Highest Probability SVM Nearest Neighbor Classifier for Spam Filtering

Authors

  • Enrico Blanzieri
  • Anton Bryl
Abstract

In this paper we evaluate the performance of the highest probability SVM nearest neighbor classifier, a combination of the SVM and k-NN classifiers, on a corpus of email messages. To classify a sample, the algorithm performs the following steps: for each k in a predefined set {k1, ..., kN}, it trains an SVM model on the k nearest labelled samples and uses this model to classify the given sample; it then fits a sigmoid approximation of the probabilistic output of the SVM model and computes the probabilities of the positive and the negative answers. Finally, out of the 2 × N resulting answers, it selects the one with the highest probability. The experimental evaluation shows that this algorithm is able to achieve higher accuracy than the pure SVM classifier, at least in the case of equal error costs.
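
The procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which SVM solver, kernel, or distance measure was used, so a linear SVM trained by batch subgradient descent and squared Euclidean distance stand in here. The sigmoid is assumed to have the Platt form P(y = +1 | f) = 1 / (1 + exp(A·f + B)), fitted on the decision values of the k neighbours. Treating a single-class neighbourhood as a certain answer is likewise an assumption. All function and parameter names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Assumed stand-in SVM: linear, hinge loss, batch subgradient descent.
    Labels must be +1 / -1."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # sample violates the margin
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def sigmoid_prob(f, A, B):
    """P(y = +1 | f) = 1 / (1 + exp(A*f + B)), clamped to avoid overflow."""
    z = max(-30.0, min(30.0, A * f + B))
    return 1.0 / (1.0 + math.exp(z))

def fit_sigmoid(fs, ys, lr=0.1, iters=1000):
    """Fit A, B by gradient descent on the cross-entropy, using
    Platt-style smoothed targets (an assumption of this sketch)."""
    n_pos = sum(1 for v in ys if v == 1)
    n_neg = len(ys) - n_pos
    t_pos, t_neg = (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0)
    ts = [t_pos if v == 1 else t_neg for v in ys]
    A, B = -1.0, 0.0
    for _ in range(iters):
        gA = gB = 0.0
        for f, t in zip(fs, ts):
            d = t - sigmoid_prob(f, A, B)  # derivative of the loss w.r.t. A*f + B
            gA += d * f
            gB += d
        A -= lr * gA / len(fs)
        B -= lr * gB / len(fs)
    return A, B

def hp_svm_nn_classify(x, X_train, y_train, ks):
    """For each k: local SVM on the k nearest samples, sigmoid fit,
    then keep the answer with the highest probability over all k."""
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((a - c) ** 2 for a, c in zip(x, X_train[i])))
    best_label, best_prob = None, -1.0
    for k in ks:
        Xk = [X_train[i] for i in order[:k]]
        yk = [y_train[i] for i in order[:k]]
        if len(set(yk)) == 1:
            # Assumption: a one-class neighbourhood yields a certain answer.
            label, prob = yk[0], 1.0
        else:
            w, b = train_linear_svm(Xk, yk)
            A, B = fit_sigmoid([dot(w, xi) + b for xi in Xk], yk)
            p_pos = sigmoid_prob(dot(w, x) + b, A, B)
            # Of the two answers for this k, keep the more probable one.
            label, prob = (1, p_pos) if p_pos >= 0.5 else (-1, 1.0 - p_pos)
        if prob > best_prob:
            best_label, best_prob = label, prob
    return best_label, best_prob
```

A call such as `hp_svm_nn_classify(x, X_train, y_train, ks=[3, 5, 9])` returns the selected label together with its estimated probability. Taking the more probable of the two answers for each k and then the maximum over k is equivalent to choosing the best of the 2 × N answers.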

Similar resources

Evaluation of the Highest Probability SVM Nearest Neighbor Classifier with Variable Relative Error Cost

In this paper we evaluate the performance of the highest probability SVM nearest neighbor (HP-SVM-NN) classifier, which combines the ideas of the SVM and k-NN classifiers, on the task of spam filtering. To classify a sample, the HP-SVM-NN classifier does the following: for each k in a predefined set {k1, ..., kN} it trains an SVM model on k nearest labeled samples, uses this model to classify t...

Instance-Based Spam Filtering Using SVM Nearest Neighbor Classifier

In this paper we evaluate an instance-based spam filter based on the SVM nearest neighbor (SVM-NN) classifier, which combines the ideas of SVM and k-nearest neighbor. To label a message the classifier first finds k nearest labeled messages, and then an SVM model is trained on these k samples and used to label the unknown sample. Here we present preliminary results of the comparison of SVM-NN wi...

E-mail Spam Filtering with Local SVM Classifiers

This paper describes an e-mail spam filter based on local SVM, namely on the SVM classifier trained only on a neighborhood of the message to be classified, and not on the whole training data available. Two problems are stated and solved. First, the selection of the right size of neighborhood is shown to be critical; our solution is based on the estimation of the a-posteriori probability of the ...

Improving spam filtering by combining Naive Bayes with simple k-nearest neighbor searches

Using naive Bayes for email classification has become very popular within the last few months. Naive Bayes filters are quite easy to implement and very efficient. In this paper we present empirical results of email classification using a combination of naive Bayes and k-nearest neighbor searches. Using this technique we show that the accuracy of a Bayes filter can be improved slightly for a high numbe...

A Novel Method for Detecting Spam Email using KNN Classification with Spearman Correlation as Distance Measure

E-mail is one of the most prevalent methods of correspondence because of its availability, quick message exchange and low sending cost. Spam mail is a serious issue affecting this application on today's internet. Spam may contain suspicious URLs, or may ask for financial information such as money exchange information or credit card details. Here comes the scope of filtering spam from legitimate em...

Publication date: 2007